{
  "nbformat": 4,
  "nbformat_minor": 0,
  "metadata": {
    "colab": {
      "name": "backpropagation_math.ipynb",
      "provenance": [],
      "collapsed_sections": [],
      "authorship_tag": "ABX9TyPZkGP0fmiMLfSo1A07eGJX",
      "include_colab_link": true
    },
    "kernelspec": {
      "name": "python3",
      "display_name": "Python 3"
    }
  },
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "view-in-github",
        "colab_type": "text"
      },
      "source": [
        "<a href=\"https://colab.research.google.com/github/SummerLife/EmbeddedSystem/blob/master/MachineLearning/gist/backpropagation_math.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "VUpT82UKv77p",
        "colab_type": "text"
      },
      "source": [
        "# Mathematical Observations For Backpropagation\n",
        "\n",
        "## Loss $C_{0}$\n",
        "\n",
        "the squared difference of the activation output and the desired output for node \n",
        "$j$ in the output layer $L$. This can be interpreted as the loss for node $j$ in layer $L$.\n",
        "\n",
        "$$\\left( a_{j}^{(L)}-y_{j}\\right) ^{2}$$\n",
        "\n",
        "Therefore, to calculate the total loss, we should sum this squared difference for each node $j$ in the output layer $L$.\n",
        "\n",
        "This is expressed as\n",
        "\n",
        "$$C_{0}=\\sum_{j=0}^{n-1}\\left( a_{j}^{(L)}-y_{j}\\right) ^{2}\\text{.}$$\n",
        "\n",
        "## Input $z_{j}^{(l)}$\n",
        "\n",
        "We know that the input for node $j$ in layer $l$ is the weighted sum of the activation outputs from the previous layer $l$ − $1$.\n",
        "\n",
        "An individual term from the sum looks like this:\n",
        "\n",
        "$$w_{jk}^{(l)}a_{k}^{(l-1)}$$\n",
        "\n",
        "the input for a given node $j$ in layer $l$ is expressed as\n",
        "\n",
        "\n",
        "$$z_{j}^{(l)}=\\sum_{k=0}^{n-1}w_{jk}^{(l)}a_{k}^{(l-1)}\\text{.}$$\n",
        "\n",
        "## Activation Output \n",
        "\n",
        "the activation output of node $j$ in layer $l$ is expressed as\n",
        "\n",
        "$$a_{j}^{(l)}=g^{\\left( l\\right) }\\left( z_{j}^{\\left( l\\right) }\\right) \\text{.}$$\n",
        "\n",
        "## Expressing $C_{0}$ As A Composition Of Functions\n",
        "\n",
        "Recall the definition of $C_{0}$:\n",
        "\n",
        "$$C_{0}=\\sum_{j=0}^{n-1}\\left( a_{j}^{(L)}-y_{j}\\right) ^{2}\\text{.}$$\n",
        "\n",
        "the loss of a single node $j$ in the output layer $L$ can be expressed as\n",
        "\n",
        "$$C_{0_{j}}=\\left( a_{j}^{(L)}-y_{j}\\right) ^{2}\\text{.}$$\n",
        "\n",
        "we can express $C_{0_{j}}$ as a function of $a_{j}^{\\left( L\\right) }$ as\n",
        "\n",
        "$$C_{0_{j}}\\left( a_{j}^{\\left( L\\right) }\\right) \\text{.}$$\n",
        "\n",
        "The activation output of node $j$ in the output layer $L$ is a function of the input for node $j$. From an earlier observation, we know we can express this as\n",
        "\n",
        "$$a_{j}^{(L)}=g^{\\left( L\\right) }\\left( z_{j}^{\\left( L\\right) }\\right) \\text{.}$$\n",
        "\n",
        "The input for node $j$ is a function of all the weights connected to node $j$. We can express $z_{j}^{\\left( L\\right) }$ as a function of $w_{j}^{\\left( L\\right) }$ as\n",
        "\n",
        "$$z_{j}^{\\left( L\\right) }\\left( w_{j}^{\\left( L\\right) }\\right) \\text{.}$$\n",
        "\n",
        "We can see that $C_{0}$ is a composition of functions.\n",
        "\n",
        "![image.png](https://github.com/SummerLife/EmbeddedSystem/raw/master/Articles/figures/bp_composition_of_functions.jpg)\n",
        "\n",
        "And we know that\n",
        "\n",
        "$$C_{0}=\\sum_{j=0}^{n-1}C_{0_{j}}\\text{}$$\n",
        "\n",
        "We observe that the total loss of the networkk for a single input is also a composion of function. This is useful in order to understand how to differentiate $C_{0}$.\n",
        "\n"
      ]
    }
  ]
}